Text File | 1994-09-04 | 112KB | 2,131 lines

CHAPTER 3

PROGRAMMING METHODS

In Chapters 1 and 2 you learned how the BASIC compiler translates a source file into the equivalent assembly language statements, and how it allocates memory to store variables and constants. In particular, you saw that the BC compiler generates assembly language code directly for some statements, while for others it creates calls to routines in the BASIC libraries. Most of the code examples presented in those chapters dealt with simple variable assignments and calculations.

Of course, the compiler must do much more than merely assign and manipulate variables and other data. Equally important is controlling how your program operates, and determining which paths are to be taken as it progresses. In this chapter we will delve into the inner workings of control flow structures, with an eye toward writing programs that are as efficient as possible.

As with the earlier chapters, this discussion includes numerous disassemblies of compiled BASIC code. Thus, you will see exactly what the compiler does, and how each control flow statement is handled.

This chapter also discusses the design of both static and non-static subprograms and functions, and compares the relative merits of each method. Many programmers do not fully understand the term Static, and find the related subject of recursive subroutines especially difficult to grasp. BASIC supports four types of subroutines, and each will be described in this chapter: GOSUB routines, subprograms, DEF FN functions, and what I call "formal functions". You will notice that I use the terms subroutine and procedure interchangeably, to indicate a single block of code that may be executed more than once. You will also learn how parameters are passed to these procedures.

Finally, in this chapter I will discuss programming style. Programming in any language is arguably as much of an art as it is a science.
But unlike, say, music, where a composer can write any sequence of notes and proclaim them acceptable, a computer program must at least work correctly. There are an infinite number of ways to accomplish any programming task, and I can make recommendations only. Which approach you choose will reflect both your own personal taste and style and your current level of competence and understanding of programming in general.


CONTROL FLOW
============

All programs--regardless of the language in which they are written--require a mechanism for testing certain conditions and then performing different actions based on those conditions. Although there are many ways to perform tests and branches in a BASIC program, all of them do essentially the same thing. The BASIC control flow statements are GOTO, DO/LOOP, WHILE/WEND, IF/THEN/ELSE, FOR/NEXT, SELECT CASE, ON GOTO, and ON GOSUB. Because the capabilities of WHILE/WEND are also available with a DO/LOOP construct, the two will be discussed together.

In almost all cases, the BASIC compiler directly generates the code that controls a program's flow. One exception is when floating point values are used as a FOR counter, or as a WHILE or UNTIL condition. In those situations, calls are made to the floating point comparison routines in the BASIC runtime library. Another exception is a statement such as CASE ASC(X$), or IF LEFT$(X$, 10) = Y$. ASC and LEFT$ are also subroutines in the BASIC language library, and they too are invoked by calls.

It is important to reiterate that when dealing with integer test conditions, BC will in many cases create assembly language code that is as good as a human programmer would write. In the short program fragment that follows, all of the BASIC source code is shown translated to the equivalent assembly language statements. This listing was derived by compiling and linking the BASIC program for Microsoft CodeView, and then using CodeView to display the resultant code.
This is what you write:

    DO
      X% = X% + 1
    LOOP WHILE X% < 100

This is the result after compilation:

    30: INC  WORD PTR [X%]       ;X% = X% + 1
        CMP  WORD PTR [X%],64    ;compare X% to 100
        JL   30                  ;jump if less to 30

Here the variable X% is incremented, and then compared to the value 100. (64 is the Hex equivalent of 100, which is how CodeView displays values.) If X% is indeed less than 100, the program jumps back to address 30 and continues processing the loop. Notice that while this example does not use a named label in the BASIC source code as the target for a GOTO, the equivalent assembly language code does. In this case, the label is the code at address 30. Do not confuse the addresses that assembly language must use as jump targets with the numbered labels that in BASIC are optional.


THE DREADED GOTO

Modern programming philosophy dictates that GOTO and GOSUB statements should be avoided at all cost, in favor of DO and WHILE loops. However, all of these methods result in nearly identical code. Indeed, there is nothing inherently wrong with using GOTO when circumstances warrant it. By examining the program listing below, you will see that BASIC generates the same code for a GOTO as it does for a DO loop.

This is what you write:

    Label:
      X% = X% + 1
      IF X% < 100 THEN GOTO Label

This is the result after compilation:

    30: INC  WORD PTR [X%]       ;X% = X% + 1
        CMP  WORD PTR [X%],64    ;compare X% to 100
        JL   30                  ;jump if less to 30

Since GOTO and DO/LOOP produce the same results, which one is better, and why? In general, a DO/LOOP is preferable for two reasons. First, it is a nuisance to have to create a new and unique label name for every location that a program may need to branch to. Admittedly, in a short program this will not be a problem. But in a large application with many small loops that test for keyboard input, you end up creating many labels with names such as GetKey1, GetKey2, and so forth. And if you inadvertently use the wrong label name, your program will not work correctly.
More important, however, is that for each label you define in a program, the BC compiler must remember its name and the equivalent address in the object code that the label identifies. Since label names can be as long as 40 characters and memory addresses require 2 bytes each to identify, only a finite number of label names can be accommodated. By avoiding unnecessary labels, you are giving BC that much more memory to use for compiling your program.

There are several situations in which GOTO is preferable to a DO or WHILE loop. Indeed, one of my personal pet peeves is when a programmer tries to shoehorn structure into a program no matter what the cost. Consider the three different code fragments below; each waits for a key press and then assigns it to the variable Ky$.

This approach is the worst:

    Ky$ = ""
    WHILE Ky$ = ""
      Ky$ = INKEY$
    WEND

This method is better:

    Label:
      Ky$ = INKEY$
      IF Ky$ = "" GOTO Label

And this is better still:

    DO
      Ky$ = INKEY$
    LOOP WHILE Ky$ = ""

In the first example, an extra step is needed solely to clear Ky$ to a null string, so the initial WHILE will be true and execute at least once. Every string assignment adds 13 bytes to a program, and those 13 bytes can add up quickly in a large application. The second example avoids the unnecessary assignment, but adds a label for GOTO to jump to. Although this label does require a small amount of additional memory while the program is being compiled, it does not increase the size of the final executable program file.

The last example is better still, because it avoids the need for a line label and also avoids an extra string assignment. Since a DO loop allows the test to be placed at either the top or bottom of the loop, you can force the loop to be executed at least once by putting the test at the bottom as shown here. However, even this can be improved upon by eliminating the string comparison that checks if Ky$ is equal to a null string.
If we replace LOOP WHILE Ky$ = "" with LOOP UNTIL LEN(Ky$), only 13 bytes of code are generated instead of 15. When two strings are compared (Ky$ and ""), each must be passed to the string comparison routine. Since LEN requires only one argument, the code to pass the second parameter is avoided.

There are some situations for which the GOTO is ideally suited. In the first two examples below, a complex expression is used as the condition for executing a DO WHILE loop, and the same expression is then used again within the loop.

    DO WHILE (X% + Y%) * Z% > 13
      IF (X% + Y%) * Z% = 100 THEN PRINT ...
      ...
    LOOP

    DO WHILE ASC(MID$(S$, A%, B%)) > 13
      IF ASC(MID$(S$, A%, B%)) > 100 THEN PRINT ...
      ...
    LOOP

    Label:
      Temp% = ASC(MID$(S$, A%, B%))
      IF Temp% > 13 THEN
        IF Temp% > 100 THEN PRINT ...
        ...
        GOTO Label
      END IF

In the first example, BASIC remembers the result of its test that checks if (X% + Y%) * Z% is greater than 13, and it uses the result it just calculated in the next test that compares the same expression to 100. This is one more example of the kinds of optimizations BC performs as it compiles your programs.

String expressions such as those used in the second example are of necessity more complex, and require calls to library routines. With this added complexity, BASIC unfortunately cannot retain the result of the earlier comparison, and it generates identical code a second time. A more elegant solution in this case is therefore the GOTO as shown in the last example. Because the result of evaluating the expression is saved manually, it may be reused within the loop. As proof, the second DO WHILE example above requires 73 bytes to implement, as opposed to only 53 when Temp% and GOTO are used.

I should also point out that the most common and valuable use for GOTO is to get out of a deeply nested series of IF or other blocks of code. It is not uncommon to have a FOR/NEXT loop that contains a SELECT CASE block, and within that a series of IF/ELSE tests.
The only way to jump out of all three levels at once is with a GOTO.


FOR/NEXT LOOPS

Unlike WHILE and DO loops that can test for nearly any condition and at either the top or bottom of the loop, a FOR/NEXT loop is intended to perform a block of statements a fixed number of times. A FOR/NEXT loop could also be replaced with code that compares a value and uses GOTO to reenter the loop if needed, but that is hardly necessary. My point is to yet again illustrate that all of BASIC's seemingly fancy constructs are no more than tests and GOTOs deep down at the assembly language level.

A FOR/NEXT loop determines the number of iterations that will be executed once ahead of time, before the loop begins. For example, the listing below shows a loop that changes the upper limit inside the loop. However, the loop still executes 10 times.

    Limit% = 10
    FOR X% = 1 TO Limit%
      Limit% = 5
      PRINT Limit%
    NEXT

The code that BASIC produces for the FOR/NEXT loop in the previous example is translated to the following equivalent during the compilation process.

    Limit% = 10
    Temp% = Limit%
    X% = 1
    GOTO Next

    For:
      Limit% = 5
      PRINT Limit%
      X% = X% + 1

    Next:
      IF X% <= Temp% THEN GOTO For

Please understand that changing a loop condition inside the loop is considered bad practice, because the program becomes difficult to understand. If you really need to alter the limit inside a loop, the loop should be recoded to use WHILE or DO instead. Another good reason for avoiding such code is that future versions of BASIC may behave differently than the one you are using now. If Microsoft were to modify BASIC such that the limit condition were reevaluated at the NEXT statement, your code would no longer work.

It is also considered bad practice to modify the loop counter variable itself (X% in the previous examples). However, this causes no real harm, and you should not be afraid to do that if the situation warrants it.
Of course, changing the loop counter will affect the number of times the loop is executed.


IF/THEN/ELSE AND SELECT CASE

BASIC provides two methods for testing conditions in a program, and executing different blocks of code based on the result. The most common method is the IF test, which can be used on a single variable, the result of an expression, the returned value from a function, or any combination of these. I won't belabor the most common uses for IF here, but I do want to point out some of its less obvious properties. Also, there are some situations where IF and ELSEIF are appropriate, and others where their counterpart, SELECT CASE, is better.

As you have already learned, a simple IF test will in most cases be translated into the equivalent assembler instructions directly. In some cases, however, the condition you specify is tested, while in others the *opposite* condition is tested. If you say IF X > 10 THEN GOTO Label, BASIC may change that to IF X <= 10 GOTO [next statement]. Which BASIC uses depends on what you will do if the condition is true, and how far away in the generated code the statements that will be executed are located.

When a GOTO is to be performed if the test passes, then the relative position of the target label is also a factor. A jump to a location either ahead in the code or more than 128 bytes backwards requires BASIC to generate more code. The 128 byte displacement is significant, because the 80x86 can perform a *conditional jump* to an address only a limited distance away. That is, after a comparison is made, the target address for a conditional jump such as "Jump if Greater" must be no more than that many bytes distant. However, an unconditional jump can be to any address within the same 64K code segment. (Bear with me for a moment, because the significance of this will soon become apparent.) This is shown in the listing that follows.
    IF X% = 100 THEN
      CMP  Word Ptr [X%],64    ;compare X% to 100
      JE   003A                ;jump ahead if equal
      JMP  Label               ;else, skip ahead
    003A:                      ;BASIC made this label
      Y% = 2
      MOV  Word Ptr [Y%],2
    END IF
    Label:

    IF X > 8 GOTO Label
      CMP  Word Ptr [X%],8     ;compare X% to 8
      JG   Label               ;jump back if greater

In the first example above, BASIC compares the value of X% to 100 (64 Hex), and if equal jumps ahead to a label it created at address 003A Hex. Otherwise, a jump is made to the next statement in the program, which in this case is a named label. Although using two jumps may seem unnecessarily convoluted, it is necessary because BASIC has no way of knowing how many statements will follow at the time it compiles the IF test. Thus, it also cannot know whether the statement following the END IF will end up being 128 or more bytes ahead. By jumping to another, unconditional jump, BC is assured that the generated code will be legal. (When BC finally encounters the END IF, it goes back to the code it created earlier, and completes the portion of the unconditional jump instruction that tells how far to go.)

Some compilers avoid this situation and create the longer, two-jump code on a trial basis, but then go back and change it to the shorter form if possible. These are called two-pass compilers, because they process your source code in two phases. Unfortunately, current versions of Microsoft BASIC do not use more than one pass.

In the second example Label has already been encountered, and BC knows that the label is within 128 bytes. Therefore, it can translate the IF statement directly, without having to conditionally jump to yet another jump. Had the earlier label been farther away, though, an extra jump would have been needed.

It is important to understand that forward jumps are always handled with more code than is likely necessary, because BASIC does not know how far ahead the jump must go.
In fact, this same issue must be dealt with when writing in assembly language, since the conditional jump distance limitation is inherent in the 80x86 microprocessor. The bottom line, therefore, is that you can in many cases reduce the size of your programs by controlling in which direction a conditional jump will be performed. For example, almost all programs must at some point sit in a loop waiting until a key is pressed. The next listing shows two common ways to do this, with one testing for a key press at the top of the loop, and the other doing the test at the bottom.

    DO UNTIL LEN(INKEY$)       ;this comprises 18 bytes
    0030:
      CALL B$INKY              ;call INKEY$
      PUSH AX                  ;pass the result to LEN
      CALL B$FLEN              ;AX now holds the length
      AND  AX,AX               ;see if it's zero
      JZ   0042                ;yes, jump to LOOP
      JMP  0044                ;no, jump out of loop
    0042:
    LOOP
      JMP  0030                ;jump back to DO
    0044:

    DO                         ;this is only 15 bytes
    LOOP UNTIL LEN(INKEY$)
      CALL B$INKY              ;call INKEY$
      PUSH AX                  ;as above
      CALL B$FLEN
      AND  AX,AX
      JZ   0044                ;jump back if zero

Viewed from a purely BASIC perspective, these two examples operate identically. But as you can see, the code that BASIC creates is more efficient for the second example. When BASIC encounters the first DO statement, it has no idea how many more statements there will be until the terminating LOOP. Therefore, it has no recourse but to create an extra jump. In the second example, the location of the DO is already known to be within 128 bytes, so the LOOP test can branch back using the shorter and more direct method.

An ELSEIF statement block is handled in a similar fashion, with code that directly compares each condition and branches accordingly. Because the code to be executed if the IF is true is always after the IF test itself, the less efficient two-jump code must be generated. A simple IF/ELSEIF follows, shown as a mix of BASIC and assembly language statements.
    IF X% > 9 THEN
      CMP  Word Ptr [X%],9     ;compare X% to 9
      JG   003A                ;assign Y% if greater
      JMP  0043                ;else jump to next test
    003A:
      Y% = 1
      MOV  Word Ptr [Y%],1     ;assign Y%
      JMP  0066                ;jump out of the block
    ELSEIF X% > 5 THEN
    0043:
      CMP  Word Ptr [X%],5     ;as above
      JG   004D
      JMP  0066
    004D:
      Y% = 2
      MOV  Word Ptr [Y%],2
    END IF
    0066:
      ...
      ...

Aside from the additional jumping over jumps that are added to all forward address references, this code is translated quite efficiently. In this situation, the compiled output is identical to that produced had SELECT CASE been used. However, there is one important situation in which SELECT CASE is more efficient than IF and ELSEIF.

For each ELSEIF test condition, code is generated to create a separate comparison. When a simple comparison such as X% > 9 is being made, only one assembly language statement is needed. But when an expression is tested--for example, ABS((X% + Y%) * Z%) > 9--identical code is generated repeatedly. This is illustrated in the listing that follows.

    IF ABS((X% + Y%) * Z%) = 5 THEN
      A% = 1
    ELSEIF ABS((X% + Y%) * Z%) = 6 THEN
      A% = 2
    ELSEIF ABS((X% + Y%) * Z%) = 7 THEN
      A% = 3
    END IF

Each time BC encounters the expression ABS((X% + Y%) * Z%), it duplicates the same assembly language statements. But when SELECT CASE is used, the expression is evaluated once, and used for each subsequent test. The first example in the next listing shows how SELECT CASE could be used to provide the same functionality as the preceding IF/ELSEIF block, but with much less code. The second example then shows what SELECT CASE really does, using an IF/ELSEIF equivalent.
You write it this way:

    SELECT CASE ABS((X% + Y%) * Z%)
      CASE 5: A% = 1
      CASE 6: A% = 2
      CASE 7: A% = 3
      CASE ELSE
    END SELECT

BASIC really does this:

    Temp% = ABS((X% + Y%) * Z%)
    IF Temp% = 5 THEN
      A% = 1
    ELSEIF Temp% = 6 THEN
      A% = 2
    ELSEIF Temp% = 7 THEN
      A% = 3
    END IF

As you can see, SELECT CASE evaluates the expression once, stores the result in a temporary variable, and then uses that variable repeatedly for all subsequent comparisons. Therefore, when the same expression is to be tested multiple times, SELECT CASE will be more efficient than IF and ELSEIF. This is also true for string expressions and other functions. For example, SELECT CASE LEFT$(Work$, 10) will result in less code and faster performance than using IF and ELSEIF with that same expression more than once.

Another important feature of SELECT CASE is its ability to use either variable or constant test conditions, and to operate on a range of values. By contrast, the C language switch statement, which is the equivalent of BASIC's SELECT CASE, can use only constant numbers for each test. BASIC is particularly powerful in this regard, and allows any legal expression for each CASE condition. For example, CASE IS > (Y AND Z) is valid, and so is CASE 0 TO Max. CASE also accepts multiple conditions separated by commas such as CASE 1, 3, 4 TO 100, -10 TO -1. In this case, the statements that follow will be executed if the selected expression equals 1, 3, any value between 4 and 100 inclusive, or any value between -10 and -1 inclusive.

It is also worth mentioning here that QuickBASIC version 4.0 contains an interesting and irritating quirk that requires a CASE ELSE in the event that none of the tests match. Had the CASE ELSE been omitted from the previous example, and had the value of the expression not been between 5 and 7, QuickBASIC 4.0 would issue a "CASE ELSE expected" error at run time. Fortunately, this has been repaired in QuickBASIC 4.5 and later versions. Notice that this is not a bug in QuickBASIC.
Rather, it is the behavior described in the ANSI (American National Standards Institute) specification for BASIC. At the time QuickBASIC 4.0 was introduced, Microsoft mistakenly believed the then-proposed ANSI standard for BASIC would be significant. As that standard approached fruition, it became clear to Microsoft that the only standard most programmers really cared about was Microsoft's.

One final point I cannot make often enough is the inherent efficiency of integer operations and comparisons. This is especially true in the comparisons that are made in both IF and CASE tests. In the first example below, each of the characters in a string is tested in turn. The second example shows a much better way to write such a test, by obtaining the ASCII value once and using that for subsequent integer comparisons.

Not recommended:

    FOR X = 1 TO LEN(Work$)
      SELECT CASE MID$(Work$, X, 1)
        CASE CHR$(9): PRINT "Tab key"
        CASE CHR$(13): PRINT "Enter key"
        CASE CHR$(27): PRINT "Escape key"
        CASE "A" TO "Z", "a" TO "z": PRINT "Letter"
        CASE "0" TO "9": PRINT "Number"
      END SELECT
    NEXT

Much more efficient:

    FOR X = 1 TO LEN(Work$)
      SELECT CASE ASC(MID$(Work$, X, 1))
        CASE 9: PRINT "Tab key"
        CASE 13: PRINT "Enter key"
        CASE 27: PRINT "Escape key"
        CASE 65 TO 90, 97 TO 122: PRINT "Letter"
        CASE 48 TO 57: PRINT "Number"
      END SELECT
    NEXT

In the first program the SELECT itself generates 27 bytes, which consist of a call to the MID$ function and then a call to the string assign routine. A string assignment is needed to save the MID$ result in a temporary variable for the tests that follow. Each CASE test that uses CHR$ adds 27 bytes, and this includes the call to CHR$ as well as an additional call to the string comparison routine. Testing for the letters adds 75 bytes, and testing for the numbers adds 39 more. This results in a total code size of 222 bytes, not counting the FOR/NEXT loop.
Contrast that with only 131 bytes for the second example, in which the SELECT portion requires only 26 bytes. Although an extra call is needed to obtain the ASCII value of the extracted character, the lack of a subsequent string assignment more than makes up for that. Further, the tests for 9, 13, and 27 require only 13 bytes each, compared to 27 when CHR$ values were used. The letters test requires 43 bytes, and the numbers test only 23. Clearly this is a significant improvement, especially in light of the small number of tests that are being performed here. In a real program that performs hundreds of string comparisons, replacing those with integer comparisons where appropriate will yield a substantial size reduction.


AND, OR, EQV, and XOR

When you use AND or OR in an IF test, what is really being compared is either 0 or -1. That is, BASIC evaluates the *truth* of each expression being tested on both sides of the AND or OR, and a truth in BASIC always results in one or the other of these values. Once each expression has been evaluated, the results are combined using an assembly language AND or OR instruction, and a branch is then made accordingly. Remember that when integers are treated as signed, setting all of the bits to 1 results in a value of -1.

In Chapter 2 I showed how the various logical operators are used to manipulate bits in an integer or long integer variable. The concept is identical when these operators are used for decision-making in a BASIC program. The difference is really more a matter of semantics than definition. That is, the same bit manipulation is performed, only in this case on the result of the truth of a BASIC expression. This is shown in context below, where two test expressions are combined using AND.
    IF X > 1 AND Y < 2 THEN
      CMP  Word Ptr [X%],1     ;compare X% to 1
      MOV  AX,0                ;assume False
      JLE  003B                ;we assumed correctly
      DEC  AX                  ;wrong, decrement to -1
    003B:
      CMP  Word Ptr [Y%],2     ;now compare Y% to 2
      MOV  CX,0000             ;assume False
      JGE  0046                ;we assumed correctly
      DEC  CX                  ;wrong, decrement to -1
    0046:
      AND  CX,AX               ;combine the results
      AND  CX,CX               ;(this is redundant)
      JNZ  004F                ;if not 0 assign Z%
      JMP  0055                ;else jump past END IF
      Z = 3
    004F:
      MOV  Word Ptr [Z%],3     ;assign Z%
    END IF
    0055:
      ...
      ...

The result of the first comparison is saved in the AX register as either 0 or -1, and the second is saved in CX using similar code. Once both tests have been performed and AX and CX are holding the appropriate values, the registers are then tested against each other using AND. The instruction AND CX,AX not only combines the results, but it also sets the CPU's Zero Flag to indicate if the result was zero or not. Therefore, the second test that uses AND to compare CX against itself to check for a zero result is redundant. At only 2 additional bytes, the impact on a program's size is not terribly significant. However, this shows first-hand the difference between code written by a compiler and code written by a person.

OR conditions are handled similarly, except the assembly language OR instruction is used instead of AND. When multiple conditions are being tested using combinations of AND and OR and perhaps nested parentheses as well, additional similar code is employed.

There are many situations where all that is really necessary is to test for a zero or non-zero condition. For example, it is common to use an integer variable as a True/False "flag" which can be set in one part of a program, and tested in another. By understanding the underlying code that BASIC creates, you can help BASIC to reduce the size of your programs enormously. In particular, avoiding a comparison with an explicit value lets BASIC generate fewer comparison instructions.
The listing below shows how you can test multiple flags using AND, but with much less resulting code than using an explicit comparison.

    IF Flag1% AND Flag2% THEN
      MOV  AX,[Flag2%]         ;move Flag2% into AX
      AND  AX,[Flag1%]         ;AND that with Flag1%
      AND  AX,AX               ;(this is redundant)
      JNZ  0063                ;if not zero assign Z%
      JMP  0069                ;else skip past END IF
      Z% = 3
    0063:
      MOV  Word Ptr [Z%],3
    END IF
    0069:
      ...
      ...

The key here is that zero is always used to represent False, and -1 to represent a True condition. That is, instead of writing IF Flag1% = -1 AND Flag2% = -1, using IF Flag1% AND Flag2% provides the same results. At only 20 bytes of generated code, this method is far superior to tests for an explicit -1 which require 37 bytes. If you recall, in Chapter 2 I showed how the various bits in a variable can be turned on or off with AND. Thus, 1111 AND 1111 equals 1111, while 1111 AND 0000 equals 0.

Notice that using 0 and -1 has many other benefits as well. For example, the NOT operator which was also described in Chapter 2 can toggle a variable between those values. If all of the bits in a variable are presently zero, then NOT Variable% results in all ones (-1). This property can also be used to enhance a program's readability, by using NOT much like you would in an English sentence. For example, the code following the line IF NOT Flag% THEN will be executed if Flag% is 0 (False), but it will not be executed if Flag% is -1 (True).

In fact, an explicit comparison is optional if you need to test only for a non-zero value. IF Variable <> 0 THEN can be reduced to IF Variable THEN, and the statements that follow will be executed as long as Variable is not 0. Notice that the only saving here is in the BASIC source, since either comparison creates ten bytes of assembler code. But when using long integers, the short form saves five bytes--14 bytes versus 19 for an explicit comparison to zero.

NOT is equally valuable when toggling a flag variable between two values.
If you have, say, an input routine that keeps track of the Insert key status, then you could use Insert% = NOT Insert% each time you detect that the Insert key was pressed. The first time the operator presses that key, the Insert flag will be switched from the default start-up value of 0 to -1. Then using Insert% = NOT Insert% a second time will revert the bits back to all zeros. In fact, it is a common technique to define True and False variables (or constants) in a program using this:

    False% = 0
    True% = NOT False%

Most programmers understand how to use parentheses to force a particular order of evaluation. By default, BASIC performs multiplication and division before it does addition and subtraction. When operators of the same precedence are being used, then BASIC simply works from left to right. However, the order in which logical comparisons are made is not always obvious. This can become particularly tricky if you are using some of the shorthand methods I described earlier.

For example, consider the statements IF X AND Y > 12, IF NOT X OR Y, and IF X AND Y OR Z. In the first example, the truth of the expression Y > 12 is evaluated first, with a result of either 0 or -1. Then, that result is combined logically with the value of X using AND. The resulting order of evaluation is performed as if you had used IF X AND (Y > 12). The other expressions are evaluated as IF (NOT X) OR Y and IF (X AND Y) OR Z.

The last logical operators we will consider are EQV and XOR. These are used rarely by most BASIC programmers, probably because they are not well understood. However, EQV can dramatically reduce the size of a program in certain circumstances. It is not uncommon to test if two conditions are the same, whether True or False. EQV stands for Equivalent, meaning it tests if the expressions are the same--either both true or both false.
All three program fragments below serve the same purpose; however, the first generates 57 bytes, while the second and third create only 16 bytes.

    IF (X = -1 AND Y = -1) OR (X = 0 AND Y = 0) THEN
      ...
    END IF

    IF X EQV Y THEN
      ...
    END IF

    IF NOT (X XOR Y) THEN
      ...
    END IF

Although these examples could be replaced with a simple comparison that tests if X equals Y, EQV can reduce other, more elaborate AND and OR tests. For example, you could replace this:

    IF (X = 10 AND Y = 100) OR (X <> 10 AND Y <> 100)

with this:

    IF X = 10 EQV Y = 100

and gain a handsome reduction in code size. Notice that because of the way EQV works, the third example in the listing above results in the same assembly language code as the second. XOR is true only when the two conditions are different, thus NOT XOR is true when they are the same.

One final point worth mentioning is that you can assign a variable based on the truth of one or more expressions. As you saw earlier, every IF test that is used in a BASIC program adds a minimum of 3 extra bytes for a second, unconditional jump. That additional code can be avoided in many cases by assigning a variable based on whether a particular condition is true or not. In the code examples that follow, both program fragments do the same thing, except the first requires 25 bytes compared to only 14 for the second.

    IF Variable = 20 THEN
      Flag = -1
    ELSE
      Flag = 0
    END IF

    Flag = (Variable = 20)

In either case, the truth of the expression Variable = 20 must be evaluated. However, the IF method adds code to jump around to different addresses that assign either -1 or 0 to Flag. The second example simply assigns Flag directly from the 0 or -1 result of the truth test. Other variants on this type of programming are statements such as A = (B = C), and Flag = (LEN(Temp$) <> 0 AND Variable < 50). Note that the surrounding parentheses are shown here for clarity only, and BASIC produces the same results without them.
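
The logical equivalences described above are easy to check for yourself. The short program that follows is only a sketch and not part of the compiled listings: it steps X and Y through the two flag values -1 and 0, prints the result of the long AND/OR test beside the EQV and NOT XOR forms, and then assigns a flag directly from a truth test. The names LongForm, EqvForm, and XorForm are made up for this illustration, and the byte counts quoted earlier are not measured here.

```basic
DEFINT A-Z                      'all variables are integers

'Try every combination of the flag values -1 (True) and 0 (False).
FOR X = -1 TO 0
  FOR Y = -1 TO 0
    LongForm = (X = -1 AND Y = -1) OR (X = 0 AND Y = 0)
    EqvForm = X EQV Y
    XorForm = NOT (X XOR Y)
    'The last three values printed on each row should agree.
    PRINT X; Y, LongForm; EqvForm; XorForm
  NEXT
NEXT

'Assign a flag directly from the truth of an expression.
Variable = 20
Flag = (Variable = 20)          'Flag receives -1 (True)
PRINT Flag
Variable = 7
Flag = (Variable = 20)          'Flag receives 0 (False)
PRINT Flag
```

Keep in mind that the three tests agree only when X and Y hold 0 or -1. EQV and XOR work bit by bit, so with values such as 1 and 2 the expression X EQV Y yields -4, which IF treats as true even though the two values differ.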

Short Circuits

There is one important point regarding AND testing you should be aware of. Although the code that BASIC creates to implement these logical tests is very efficient, in some cases a different approach can yield even better results. When many conditions are tested, QuickBASIC creates assembly language code to evaluate all of them before making a decision. This can be wasteful, because often one of the conditions will be false, negating a need to test the remaining conditions. For example, this statement:

  IF Any$ = "Quit" AND IntVar% > 100 AND Float! <> 0 THEN PRINT "True"

requires that all three conditions be tested before the program can proceed. But if Any$ is not equal to "Quit", there is no reason to spend time evaluating the other tests. The solution is to instead use nested IF tests, preferably placing the most likely (or simplest) tests first, as shown below.

  IF Any$ = "Quit" THEN
    IF IntVar% > 100 THEN
      IF Float! <> 0 THEN
        PRINT "True"
      END IF
    END IF
  END IF

Here, if the first test fails, no additional time is wasted testing the remaining conditions. Further, using the nested IF tests with QuickBASIC also results in less code: 50 bytes versus 64. Note, however, that BASIC PDS [and VB/DOS] incorporate a technique known as *short circuit expression evaluation*, which generates slightly more efficient code when AND is used. With the newer compilers, each condition is tested in sequence, and the first one that fails causes the program to skip over the code that prints "True". But even with this improved code generation, you should still place the most likely tests first.


ON GOTO AND ON GOSUB STATEMENTS

The last non-procedural control flow statements I will discuss here--ON GOTO and ON GOSUB--are used infrequently by many BASIC programmers. But when you need to test many different values *and* those values are sequential, ON GOTO and ON GOSUB can reduce substantially the amount of code that BASIC generates.
For clarity, I will use ON GOTO for most of the examples that follow. Both work in a similar fashion, except that with ON GOSUB execution resumes at the next BASIC statement when the subroutine returns.

You have already seen that IF/ELSEIF and SELECT CASE blocks are not as efficient as they could be, because the compiler does not know how far ahead the END IF or END SELECT statements are located. Therefore, no matter how trivial the IF or CASE tests being performed are, a pair of jumps is always created even when a single jump would be sufficient. Further, when many tests are necessary, there is no avoiding at least some amount of code for each comparison.

This is where ON GOTO can help. Rather than perform a series of separate tests for each value being compared, ON GOTO uses a lookup table that is embedded in the code segment. This table is merely a list of addresses to branch to, based on the value of the variable or expression being evaluated. If the value being tested is 1, then a branch is taken to the first label in the list. If it is 2, the code at the second label is executed, and so forth.

As many as 60 labels can be listed in an ON GOTO statement, although the number being tested can range from 0 to 255. If the value is 0 or higher than the number of items in the list, the ON GOTO command is ignored, and execution resumes with the statement following the ON GOTO. Negative values or values higher than 255 cause an "Illegal function call" error. A simple example showing the basic syntax for ON GOTO is shown below.

  INPUT "Enter a value between 1 and 3: ", X
  ON X GOTO Label1, Label2, Label3
  PRINT "Illegal entry!"
  END

  Label1:
    PRINT "You pressed 1"
    END

  Label2:
    PRINT "You pressed 2"
    END

  Label3:
    PRINT "You pressed 3"
    END

Notice that the more labels there are, the bigger the savings in code size. ON GOTO adds a fixed overhead of 70 bytes, 61 of which is the size of the library routine that evaluates the value and actually jumps to the code at the appropriate label.
The remaining 9 bytes are needed to load the value being tested and pass that on to the ON GOTO routine. However, for each label in the list, only 2 bytes are required in the lookup table to hold the address. Compare that to SELECT CASE, which requires 6 bytes of set-up code (when an integer is being tested), and 13 bytes more to process each CASE. Thus, the crossover point at which ON GOTO is more efficient is when there are 6 or more comparisons. Notice that if ON GOTO is used in more than one place in a program, the savings are even greater because the 61-byte library routine is added only once.

Again, ON GOTO has the important restriction that all of the values must be sequential. However, this limitation can also be turned into a feature by taking advantage of the inherent efficiency of lookup tables. Using a lookup table is a very powerful technique, because you can determine a result using an index rather than actually calculating the answer. A lookup table is commonly used to determine log and factorial functions, since those calculations are particularly tedious and time consuming. With a lookup table you would calculate all of the values once ahead of time, and fill an array with the answers. Then, to determine the factorial for, say, the number 14, you would simply read the answer from the fourteenth element in the array.

You can apply this same technique in BASIC using a combination of INSTR and ON GOTO or ON GOSUB. Although INSTR is intended to find the position of one string within another, it is also ideal for looking up characters in a table. Imagine you have written an input routine that must handle a number of different keys, and branch according to which one was pressed. One way would be to use an IF/ELSEIF or SELECT CASE block, with one section devoted to each possible key. But as you saw earlier, once there are more than 5 keys to be recognized, either of those constructs is less efficient than ON GOTO.
The approach I often use is to combine INSTR and ON GOSUB to branch according to which function key was pressed. The beauty of this method is that a value of zero (or one that is out of range) causes control to fall through to the next statement. Therefore any keys that are not explicitly being tested for are simply ignored. This is shown in context below.

  DO
    DO                             'wait for a key press
      K$ = INKEY$
      Length% = LEN(K$)
    LOOP UNTIL Length%

    IF Length% = 2 THEN            'it's an extended key
      Code$ = RIGHT$(K$, 1)        'isolate the key code
                                   'and branch accordingly
      ON INSTR(";<=>?@ABCD", Code$) GOSUB ...
    END IF
  LOOP UNTIL K$ = CHR$(27)         'until they press Esc

Here, extended keys are identified by a length of 2, and the key code is then isolated with RIGHT$. The punctuation and letters within the quotes are characters 59 through 68, which correspond to the extended codes for F1 through F10. (A list of all the extended key codes is in your BASIC owner's manual.) Of course, any arbitrary list of key codes could be used. Further, the key codes do not need to be contiguous. For example, to branch on the Up arrow, Down arrow, Ins, Del, PgUp, and PgDn keys you would use "HPRSIQ" as the source string. Any other mix of characters could also be used, including Alt keys.

Another interesting and clever trick that combines INSTR and ON GOTO lets you test multiple keys regardless of capitalization. The short program below accepts a character, and uses INSTR to look it up in a table of upper and lower case character pairs.

  PRINT "Yes/No/Load/Save/Retry/Quit? ";
  DO
    K$ = INKEY$
  LOOP UNTIL LEN(K$) = 1
  ON (INSTR("YyNnLlSsRrQq", K$) + 1) \ 2 GOTO ...

After adding 1 and dividing that by 2, the result will indicate in which character pair the choice was found. This technique could also be extended to include 3- or 4-character groups, or any other combination of characters. Since any value between 0 and 255 is legal for an ASCII character, INSTR can be used in other, more general lookup situations as well.
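To make the arithmetic of the character-pair lookup concrete, here is a minimal sketch that fills in the elided label list. The label names (DoYes, DoNo, and so on) and the messages are hypothetical and stand in for whatever your program would actually do.

```basic
PRINT "Yes/No/Load/Save/Retry/Quit? ";
DO
  K$ = INKEY$
LOOP UNTIL LEN(K$) = 1

'INSTR returns 1 or 2 for "Y" or "y", 3 or 4 for "N" or "n", and so on.
'Adding 1 and using integer division by 2 turns each pair into 1, 2, 3...
'A key that is not in the table gives (0 + 1) \ 2 = 0, so control falls
'through to the statement that follows the ON GOTO.
ON (INSTR("YyNnLlSsRrQq", K$) + 1) \ 2 GOTO DoYes, DoNo, DoLoad, DoSave, DoRetry, DoQuit
PRINT "That key is not one of the choices"
END

DoYes:
  PRINT "Yes": END
DoNo:
  PRINT "No": END
DoLoad:
  PRINT "Load": END
DoSave:
  PRINT "Save": END
DoRetry:
  PRINT "Retry": END
DoQuit:
  PRINT "Quit": END
```

For 3-character groups the same idea would presumably use (INSTR(Table$, K$) + 2) \ 3, where Table$ is a hypothetical string holding the groups.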


A COMPARISON OF SUBROUTINE METHODS
==================================

There are four primary subroutine types that BASIC supports: GOSUB subroutines, DEF FN functions, called subprograms, and what I refer to as "formal functions". Each has its own advantages and disadvantages, which I will describe momentarily. But I would first like to introduce several terms that will be used throughout the discussion that follows.

The first is *module*, which is a series of BASIC program statements kept in their own separate source file. All modules have a main portion, and some also have procedures within a SUB or FUNCTION block. The main portion of a program is that which receives control when the program is first run. When a program consists of multiple modules, each additional module has a main portion, although code within that portion is rarely executed. In fact, there are only two ways to access code in the main portion of an ancillary module: One is to create a line label and use that as the target for ON ERROR or another "ON" event. The other is to define a DEF FN function and invoke the function.

The second term is *variable scope*, which indicates where in a program a variable may be accessed. Variables that are used in the main portion of a program are accessible anywhere else in the main, but not within a SUB or FUNCTION block. Likewise, a variable that is defined within a SUB or FUNCTION is by default private to that procedure. The overwhelming advantage of private variables is that you do not have to worry about errors caused by inadvertently using the same variable name twice.

The third term is *SHARED*, and it overrides the default private scope of a variable used in a procedure. SHARED may be used in either of two ways. If it is specified with a DIM statement in the main body of a program--that is, DIM SHARED Variable--the variable is established as being shared throughout the entire source file.
Even though DIM is usually associated with arrays, it can be used this way to extend a variable's scope.

SHARED may also be used within a subroutine to share one or more variables with the main portion. Notice that the statement SHARED Variable inside a procedure defines the variable as being shared with the main portion of the program only. SHARED used within a procedure does not share the named variable with any other procedures. The only exception is when other procedures also use SHARED with the same variable name. In that case they are shared between procedures, as well as with the main program.

      ╔═════════════════════════════╗
      ║   DEFINT A-Z                ║
      ║   DIM SHARED Var1           ║
      ║                             ║
   ┌──╫──>Var1 = 100                ║
┌──│──╫──>Var2 = 200                ║
│  │  ║   CALL Sub1(Var2)           ║
│  │  ║   CALL Sub2(Var2)           ║
│  │  ║   END                       ║
│  │  ║                             ║
│  │  ║   SUB Sub1 (Param) STATIC   ║
│  ├──╫────>Var1 = Param            ║
│  │  ║     Var2 = Var1             ║
│  │  ║   END SUB                   ║
│  │  ║                             ║
│  │  ║   SUB Sub2 (Param) STATIC   ║
│  │  ║     SHARED Var2             ║
│  └──╫────>Var1 = Param            ║
└─────╫────>Var2 = Var1             ║
      ║   END SUB                   ║
      ╚═════════════════════════════╝

Figure 3-1: How SHARED and DIM SHARED affect variable scope. Variables that share the same identity are shown connected.

The fourth term is *COMMON*, which is related to SHARED in that it also lets you share variables among procedures. However, COMMON has the additional property of allowing variables to be shared by procedures that are not in the same physical source file. When BC compiles your program, it translates your variable names to memory addresses. Thus, those names are not available when the program is linked to other object files. Variables that are listed in a COMMON statement are placed in a separate portion of the data segment which is reserved just for that purpose. Therefore, other program modules using COMMON can also access those variables in that portion of DGROUP.

        MODULE1.BAS
      ╔═════════════════════════════╗
      ║   DEFINT A-Z                ║
      ║   COMMON SHARED Var1        ║
      ║                             ║
┌─────╫──>Var1 = 100                ║
│  ┌──╫──>Var2 = 200                ║
│  │  ║   CALL Sub1(Var2)           ║
│  │  ║   CALL Sub2(Var2)           ║
│  │  ║   END                       ║
│  │  ║                             ║
│  │  ║   SUB Sub1 (Param) STATIC   ║
├──│──╫────>Var1 = Param            ║
│  │  ║     Var2 = Var1             ║
│  │  ║   END SUB                   ║
│  │  ║                             ║
│  │  ║   SUB Sub2 (Param) STATIC   ║
│  │  ║     SHARED Var2             ║
├──│──╫────>Var1 = Param            ║
│  └──╫────>Var2 = Var1             ║
│     ║   END SUB                   ║
│     ╚═════════════════════════════╝
│
│       MODULE2.BAS
│     ╔═════════════════════════════╗
│     ║   DEFINT A-Z                ║
│     ║   COMMON Var1               ║
│     ║                             ║
└─────╫──>Var1 = 100                ║
   ┌──╫──>Var2 = 200                ║
   │  ║   CALL Sub1(Var2)           ║
   │  ║   CALL Sub2(Var2)           ║
   │  ║   END                       ║
   │  ║                             ║
   │  ║   SUB Sub1 (Param) STATIC   ║
   │  ║     Var1 = Param            ║
   │  ║     Var2 = Var1             ║
   │  ║   END SUB                   ║
   │  ║                             ║
   │  ║   SUB Sub2 (Param) STATIC   ║
   │  ║     SHARED Var2             ║
   │  ║     Var1 = Param            ║
   └──╫────>Var2 = Var1             ║
      ║   END SUB                   ║
      ╚═════════════════════════════╝

Figure 3-2: How COMMON and COMMON SHARED affect variable scope. Variables that share the same identity are shown connected.

COMMON can also be combined with SHARED, to specify that one or more variables be shared throughout the main program as well as with other modules. That is, the statement COMMON SHARED Variable tells BASIC that Variable is to be both DIM SHARED and COMMON. To establish a TYPE variable as COMMON, you must state the type name as well: COMMON TypeVar AS MyType. In all cases, COMMON statements must precede the executable statements in a program. The only statements that may appear before COMMON are other non-executable statements such as DECLARE, CONST, and '$STATIC.

Because the variable names listed in a COMMON statement are not stored in the final program, the names used in one module do not need to be the same as the corresponding names in another module. You could, for example, have COMMON X%, Y$, Z# in one file, and COMMON A%, B$, C# in another. Here, X% refers to the same memory location as A%; Y$ is the same variable as B$, and so forth.
It is imperative, however, that the order and type of variables match. If one file has an integer followed by a string followed by a double precision variable, then all other files containing a COMMON statement must have their COMMON variables in that same order. This is one good reason for storing all COMMON statements in a single include file, which is included by each module that needs access to the COMMON variables.

One or more arrays may also be listed as COMMON; however, the rules are different for static and dynamic arrays. When a dynamic array is to be made COMMON, it should be dimensioned in the main program only, following the COMMON statement. (But you may use REDIM in another module if necessary, to change the array's size.) Static arrays must be dimensioned in each module, before the associated COMMON declaration. Of course, all array types must match across modules--you may not list a static array as the first COMMON item in one file, and then list a dynamic array in that same position in another file.

There are actually two forms of COMMON statement: the blank COMMON and the named COMMON. The examples shown thus far are blank COMMON statements. A named COMMON block lets you specify selected variable groups as COMMON, to avoid having to list many variables when all of them are not needed in a given module. A COMMON block is named by preceding the variable list with a name surrounded by slash characters. For instance, this line:

  COMMON /IntVars/ X%, Y%, Z%

establishes a named COMMON block called IntVars. By creating several such named blocks you may share only those that are actually needed in a given module. In this case, the block name is stored in the object file, and LINK ensures that the COMMON variables in each module share the same addresses. One important limitation of a named COMMON block is that it cannot be used to pass information between programs that use CHAIN.
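The include-file approach suggested above might be organized as in the sketch below. The three file names and the ShowValues subprogram are hypothetical, and the three parts are meant to be saved as separate files, with the two .BAS modules compiled separately and linked together.

```basic
'---- SHARED.BI (hypothetical include file) ----
'Every module that includes this file sees the same three variables,
'in the same order and with the same types.
COMMON SHARED /IntVars/ X%, Y%, Z%

'---- MODULE1.BAS (hypothetical main module) ----
'$INCLUDE: 'SHARED.BI'
X% = 10: Y% = 20: Z% = 30
CALL ShowValues               'ShowValues is located in MODULE2.BAS
END

'---- MODULE2.BAS (hypothetical second module) ----
'$INCLUDE: 'SHARED.BI'
SUB ShowValues STATIC
  PRINT X%; Y%; Z%            'the values assigned in MODULE1.BAS
END SUB
```

Because both modules take the COMMON statement from one place, a later change to the variable list cannot leave the modules out of step with each other.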
The fifth term is *STATIC*, which I described in a slightly different context in the section about data in Chapter 2. When you add the STATIC option to a SUB or FUNCTION definition, BASIC treats the variables within that procedure very differently than when STATIC is omitted. With STATIC, memory in DGROUP is allocated by the compiler for each variable, and that memory is permanently reserved for use by those variables. When STATIC is not specified, the variables in the routine are by default placed onto the system stack. This means that sufficient stack memory must be available, although that memory can then be used again later for variables in other procedures. An important side effect of using the stack for variable storage is that the memory is cleared each time the subprogram or function is entered. Therefore, all numeric variables are initialized to zero, and strings are initialized to null. Any arrays within a non-static procedure are by default dynamic, which means they are created upon entry to the routine and erased when the routine exits. STATIC also has an additional meaning in subprograms and functions; it can establish variables as being private to a procedure. If a variable has been declared as shared throughout a module by using DIM SHARED in the main portion of the program, using the statement STATIC Variable inside the subroutine will override that property. Thus, Variable will be local to the procedure, and will not conflict with a global shared variable of the same name. STATIC within a subprogram or function also lets you use the same name for a variable that was already given to a named constant. Many programmers find the use of the term STATIC for two very different purposes confusing, and rightly so. It would have made more sense to use a different keyword, perhaps LOCAL, to limit a variable's scope. And to further confuse the issue, the '$STATIC metacommand is used to establish the memory storage method for arrays. 
Nonetheless, STATIC always indicates that memory for a variable is permanently allocated, and it may also specify that a variable is private to a procedure.

The final term I want to introduce now is *recursion*. The classic definition of a recursive procedure is that it may call itself. While this is certainly true, that doesn't really explain what recursion is all about, or how it could be useful. I will cover recursion in depth momentarily, but for now suffice it to say that recursion is often helpful when manipulating tree-structured information.

For example, a program that lists all of the files on a hard disk would most likely be based on a recursive subroutine. Such a program would first change to the root directory, and then call the routine to read and display all of the file names it finds there. Then for each directory under the current one, the routine would change to that directory and call itself again to read and display the files in that directory. And if more directories were found at the next level down, the routine would call itself yet again to process all of those files too. This continues until all of the files in all directories on the hard disk have been processed.

Another application for recursion is a subroutine that sorts an array on more than one key. For example, consider a TYPE array in which each element has components for a first name, a last name, and address fields. You might want to be able to sort that array first by last name, then by first name, and then by zip code. That is, all of the Smiths would be grouped together, and within that group Adam would be listed before John. All of the John Smiths would in turn be sorted in zip code order. By employing recursion, the routine would first sort the entire array based on the last name only. Next, it would identify each range of elements that contain identical last names.
The routine would then call itself to sort that subgroup, and call itself again to sort the subgroup within that group based on zip code.


SUBROUTINES VERSUS FUNCTIONS

There is a fundamental difference between subroutines and functions. A subroutine is accessed with either a CALL or GOSUB statement, and a function is invoked by referencing its name. In general, a subroutine is used to perform an action such as opening a group of files, or perhaps updating a screen-full of information. A function, on the other hand, returns a value such as the result of a calculation. A string function also returns information, although in this case that information is a string.

Notice that the type of information returned by a function is independent of the type of parameters, if any, that are passed to it. For example, BASIC's native STR$ function accepts a numeric argument but returns a string. Likewise, a numeric function such as INSTR accepts two strings and returns a single integer. This is also true for functions that you design using either DEF FN or FUNCTION.

Although a function is primarily used for calculations and a subroutine for performing one or more actions, there is no hard and fast distinction between the two. You could easily design a subroutine that multiplies three numbers and returns the answer in one of the parameters. Similarly, a function could be written to clear the screen and then open a file. Which you use and when will depend on your own programming style. However, there are definite advantages to using functions where appropriate.

One immediately obvious benefit of a function is that a value can be returned without requiring an additional passed parameter. Each variable that is passed as a parameter requires 4 bytes of code for setup, plus an additional 5 bytes within the subroutine each time it is accessed. Another important advantage of using a function is BASIC's automatic type conversion.
If you assign a single precision variable from the result of an integer function, BASIC will convert the data from one format to the other transparently. In fact, a simple assignment from a variable of one type to that of another type is also handled for you by the compiler. But if a routine is written to pass the value back as a parameter, then you must use whatever type of data the subprogram expects.

Although most high-level languages require the programmer to match explicitly the types of data being assigned, Microsoft BASIC has done this automatically since its inception. When you write Var1! = Var2%, BASIC treats that as Var1! = CSNG(Var2%). Object oriented programming languages use the term *polymorphism* to describe such automatic type conversion.


GOSUB ROUTINES

The primary advantage a GOSUB routine holds over all of the other subroutine types is that it can be accessed very quickly. Translated to assembly language, a GOSUB statement is but three bytes in length, and its speed is surpassed only by a GOTO. When the only thing that matters is how fast a subroutine can be called, GOSUB has the clear advantage.

However, there are many limitations inherent in a GOSUB. The most important restriction is that arguments cannot be passed using GOSUB. Therefore, any variables must be assigned before invoking the routine, and possibly reassigned when it returns. For example, if a subroutine requires two parameters--perhaps a row and column at which to print a message--those variables must be assigned before the GOSUB can be used. And if a value is being returned, your program must know the name of the variable that was assigned within the GOSUB routine.

Another important limitation is that the target line label must be in the same block of code as the GOSUB. Although a GOSUB is legal within a SUB or FUNCTION, both the GOSUB and the routine it calls must be located in the same procedure.
Likewise, a GOSUB in the main body of a program cannot access a subroutine inside a procedure, or vice versa. [And of course you cannot invoke a GOSUB routine that is located in a different source module.]

Both of these problems restrict your ability to reuse a subroutine in more than one program. One of the goals of modern structured programming is the ability to design a routine for one application, and also use it again later in other programs. The only way to do that using GOSUB routines is to establish a variable naming convention, and always use variables and line labels with those unique names.


SUBPROGRAMS

Subprograms were introduced with QuickBASIC version 2.0, and they improve greatly on GOSUB routines in many respects. The most important advantages of a subprogram are that it accepts passed parameters, and that variables used within the subprogram are local by default. Besides the obvious benefit of not having to worry about variable naming conflicts, these properties allow you to create your own toolbox of useful subroutines, and use them repeatedly in different programming projects. I will discuss this use of subprograms in detail later in this chapter.

A subprogram is accessed using the CALL statement, and any number of arguments may optionally be passed to the routine. A subprogram is defined with a statement of the form SUB SubName (Param1, Param2, ...) STATIC. The parameters and surrounding parentheses are optional, as is the STATIC directive. Of course, the number of arguments passed to a subprogram must match the number of parameters it expects.

As you can see, subprograms have many advantages over GOSUB routines. However, they are not a magical panacea for every programming problem. Each subprogram includes a fixed amount of overhead just to enter and exit it. Because of the complexities of accessing incoming parameters, a *stack frame* must be created by the compiler upon entry.
A stack frame is simply a fancy name for an area of memory that holds the addresses of the incoming parameters. However, this requirement adds a fair amount of code to each subprogram.

Eight bytes of code are needed to set up and call the internal BASIC routine that creates the stack frame, and the routine itself comprises another 35 bytes. Eight more bytes are needed to call the routine that exits a subprogram, and that routine contains 26 bytes. Finally, all but the last subprogram in a source file need a 3-byte jump to skip over the other subprograms that follow. Therefore, a total of 80 bytes (8 + 35 + 8 + 26 + 3) are added to any program that uses a subprogram rather than a GOSUB routine. It is important to point out, however, that the 61 bytes used by the library routines to enter and exit a subprogram are added to the final .EXE file only once.

It is also worth mentioning that BASIC PDS provides the /Ot switch, which eliminates the usual overhead incurred from calling the routines needed to enter and exit a subprogram. Although using /Ot avoids the code that is otherwise added, there is one important restriction: You may not use a GOSUB within the subprogram.

When a program performs a GOSUB, the address to return to is placed onto the stack, for retrieval later when the subroutine returns. Likewise, when a subprogram is called, both a segment and address to return to are put on the stack. If a GOSUB were used inside the subprogram and an EXIT SUB was then encountered within the GOSUBed subroutine, the return addresses on the stack would be out of order. Thus, the subprogram would return to the wrong place, with undoubtedly disastrous consequences. To avoid this, BASIC by default saves the address to return to when the subprogram is first entered, and uses that when it is exited. Therefore, when the compiler sees that a GOSUB is being used, it does not use the abbreviated method even if /Ot has been specified.
Although using /Ot makes a subprogram (and function) much faster by eliminating the overhead to call the entry and exit routines, there is no actual saving in code size. A series of assembler NOP (No Operation) instructions are placed where the entry and exit code would have been. However, those empty instructions are never executed. We can only hope that in future releases of BASIC PDS Microsoft will improve BC's code generation to eliminate these unnecessary instructions. [Yeah, right.]

Another problem with subprograms is that programmers tend to use them to excess. For example, I have seen people create subprograms to increment and decrement integer variables even though it is far more efficient to do that with in-line code. The statement X% = X% + 1 creates only 4 bytes of code, compared to 9 for a single call to a subprogram to do the same thing! However, incrementing long integer or floating point variables does take more code than invoking a subprogram with a single parameter, so a subprogram could be useful in that case. Only by counting the number of times a subprogram will be used and comparing that to the overhead incurred can you determine whether there will be any savings.


DEF FN FUNCTIONS

Although a DEF FN function is designed to return a result, it is more closely related to a GOSUB subroutine in actual operation. Like a GOSUB routine it is invoked with a 3-byte assembly language "near" call, as opposed to the 5-byte "far" call that subprograms and formal functions require. And while a DEF FN function can accept incoming parameters, variables within the function definition are by default shared with the main portion of the program.

As I already explained, variables used in a DEF FN function can be made private to the function only by explicitly declaring them as STATIC. However, at least it is possible to employ local variables. Further, a DEF FN function can return a result, which makes it an ideal replacement for GOSUB when speed is paramount.
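As an illustration of a DEF FN function that keeps one of its variables private, consider the minimal sketch below. The function name FnMax% and the variable Temp% are made up for this example.

```basic
DEF FnMax% (A%, B%)
  STATIC Temp%                'Temp% is now private to the function
  IF A% > B% THEN Temp% = A% ELSE Temp% = B%
  FnMax% = Temp%
END DEF

Temp% = 99                    'a different variable from the one above
PRINT FnMax%(3, 7)            'prints 7
PRINT Temp%                   'still prints 99
```

If the STATIC statement were removed, Temp% inside the function would be the same variable as Temp% in the main program, and the second PRINT would show 7 instead of 99.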
Internally, parameters are passed to a DEF FN function very differently than to a called subprogram or formal function. Arguments are passed to a subprogram by placing their addresses on the stack. With a DEF FN function, however, a copy of each parameter is created, and the function directly manipulates those copies. Therefore, it is impossible for a DEF FN function to modify an incoming parameter directly. This behavior is neither good nor bad. Rather, it is simply different and thus important to understand.

It is also important to understand that a DEF FN function can be used only in the module in which it is defined. If the same function is needed in different modules, the same code must be duplicated again and again.

In the manuals that come with QuickBASIC and BASIC PDS, Microsoft advises against using DEF FN functions, in favor of the newer, more powerful formal functions. Because of this favoritism, Microsoft will probably never correct one disturbing anomaly that is present in all DEF FN functions. When a string is passed as an argument to a DEF FN function, a copy is made for the function to manipulate. Unfortunately, the copy is never deleted! Therefore, if you pass, say, a 10,000 byte string to a DEF FN function, that amount of memory is permanently taken until the function is invoked again later. The short listing below proves this behavior.

  DEF FnWaste (A$)
    FnWaste = ASC(A$)
  END DEF

  Big$ = SPACE$(10000)
  PRINT FRE(Big$)
  X = FnWaste(Big$)
  PRINT FRE(Big$)

Notice that running this program in the QuickBASIC editing environment will not give the expected (memory-wasting) result. However, in a separately compiled program the 10000 byte loss will be evident.

As with subprograms, there is a fixed amount of overhead required to enter and exit a DEF FN function. For each function that has been defined, 5 bytes are needed to call the Enter and Exit routines. Further, these routines are 14 and 24 bytes in length respectively.
But again, the routines themselves are added to a program only once when it is linked.

There are two final limitations of DEF FN functions worth mentioning here. The first is that arrays and TYPE variables may not be passed as parameters to them. Since by design a copy is made of every incoming parameter, there is no reasonable way to do that with an entire array. The second limitation is that the function definition must be physically positioned in the source file before any references are made to it.


FORMAL FUNCTIONS

A formal function is nearly identical to a called subprogram, and it requires the exact same amount of overhead to enter and exit. Also like subprograms, nearly any type of data may be passed to a function, including TYPE variables and arrays. The only limitation is that a fixed-length string may not be used directly as a parameter. If a fixed-length string is passed to a subprogram or function that expects a string, a copy is made and assigned to a conventional string. This copying was described in detail in Chapter 2.

Because a formal function is invoked by referencing its name in an assignment or PRINT statement, it is essential that it be declared. After all, how else could BASIC know that the statement PRINT MyFunc means to call a function and display the result, as opposed to printing the variable named MyFunc? When a BASIC function is created in the BASIC editing environment, a corresponding DECLARE statement is generated automatically. But when a function is written in another language or kept in a Quick Library, an explicit declaration is mandatory.

Like subprograms, formal functions are ideally suited to modular, reusable programming methods. Furthermore, a function may be accessed from any module in an entire application, even those in other source files. Indeed, the only difference between a subprogram and a function is that a function returns a result. The assembly language code that BASIC generates is in all other respects identical.
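The difference between returning a result through a parameter and returning it from a formal function can be seen in the short sketch below. Both routines multiply three numbers; the names GetProduct and Product& are invented for this illustration.

```basic
DECLARE SUB GetProduct (A%, B%, C%, Result&)
DECLARE FUNCTION Product& (A%, B%, C%)

'With the subprogram, the last argument must be a long integer because
'that is the type of data the subprogram expects.
CALL GetProduct(2, 3, 4, Answer&)
PRINT Answer&                 'prints 24

'With the function no extra parameter is needed, and BASIC converts the
'long integer result to single precision during the assignment.
Total! = Product&(2, 3, 4)
PRINT Total!                  'prints 24

SUB GetProduct (A%, B%, C%, Result&) STATIC
  Result& = CLNG(A%) * B% * C%
END SUB

FUNCTION Product& (A%, B%, C%) STATIC
  Product& = CLNG(A%) * B% * C%
END FUNCTION
```

The DECLARE statement for the function is what lets BASIC treat the name Product& as a function call rather than as an array or variable reference.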
STATIC VERSUS NON-STATIC PROCEDURES

As I stated earlier, when the STATIC keyword is appended to a SUB or FUNCTION declaration, all of the variables within the routine are assigned a permanent address in DGROUP. And when STATIC is omitted, the variables are instead stored on the stack and cleared to zeros or null strings each time the routine is entered. There are several important ramifications of this behavior.

Non-static procedures allocate new stack memory each time they are invoked, and then release that memory when they exit. It is therefore possible to exhaust the available stack space when the subroutine calls are deeply nested. For example, if you call one subprogram that then calls another which in turn calls yet another, sufficient stack memory must be available for all of the variables in all of the subprograms.

Besides the memory needed for each variable in a subprogram or function, other data is also placed onto the stack as part of the call. For each parameter that is passed, 2 bytes are taken to hold its address. Add to that 4 bytes to store the segment and address to return to in the calling program. Finally, temporary variables that BASIC creates for its own purposes are also stored on the stack in a non-static subprogram or function.

Another important consideration when STATIC is omitted is that every string variable must be deleted before the subprogram exits. Because of the way BASIC's string management routines operate, memory that holds string descriptors and string data cannot simply be abandoned. Every string must be released explicitly by a called routine, at a cost of 9 bytes per string. Please understand that you do not have to delete these strings. Rather, this is another case where BASIC creates additional code without telling you.

Again, I would love to be able to tell you that using STATIC is always desirable, or that never using it always makes sense. But unfortunately, it just isn't that simple.
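The difference in how the two kinds of variables behave can be seen with a short test. The sketch below uses two invented subprograms, CountStatic and CountStack, that are identical except for the STATIC keyword.

```basic
DECLARE SUB CountStatic ()
DECLARE SUB CountStack ()

FOR Pass% = 1 TO 3
  CALL CountStatic            'prints 1, then 2, then 3
  CALL CountStack             'prints 1 every time
NEXT

SUB CountStatic STATIC
  'Calls% has a permanent address in DGROUP and keeps its value.
  Calls% = Calls% + 1
  PRINT "Static:"; Calls%
END SUB

SUB CountStack
  'Calls% is created on the stack and cleared to zero on each entry.
  Calls% = Calls% + 1
  PRINT "Non-static:"; Calls%
END SUB
```

The static version remembers its count between calls because its variable is never released, while the non-static version starts over each time it is entered.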
When a program becomes very large and complex, only by counting variables can you be absolutely certain how much stack space is really needed. Although the FRE(-2) function may be used to determine how much stack memory is currently available, it does not tell how much memory is actually needed by each routine.

To summarize the trade-offs between static and non-static variables: Static variables are allocated permanently by the compiler, and the memory they occupy can never be used for any other purpose. Non-static variables are placed onto the stack, and exist only while the subprogram or function is in use.

Remember that you can also have a mix of static and non-static variables in the same procedure. By omitting STATIC after the subroutine name, all variables will by default be non-static. You can then override that property for selected variables by using the STATIC keyword. In the section on debugging in Chapter 4, you will learn how to use CodeView to determine the stack requirements for a procedure's variables.

Controlling the Stack Size

There are several ways to control the amount of memory that is dedicated for use by the stack. All versions of BASIC support the CLEAR command, which takes an optional argument that sets the stack size. The statement CLEAR , , StackSize sets aside StackSize bytes for the stack. Unfortunately, CLEAR also clears all of the data in a program, closes any open files, and erases all arrays. If you know ahead of time how much stack memory will be needed, then using CLEAR as the first statement in a program will not cause a problem.

Even when CLEAR is used as the first statement in a program, there is still one situation where that will not be acceptable. When you use CHAIN to execute a subsequent program, a CLEAR statement in that program will clear all of the variables that have been declared COMMON.
Fortunately, there are two solutions to this problem: BASIC PDS offers the STACK statement, which lets you establish the size of the stack but without the side effects of CLEAR. For example, the statement STACK 5000 sets aside 5000 bytes for the stack. The other solution is to use the /STACK: link switch, which reserves a specified number of bytes. All of the options that LINK supports are described in Chapter 5.

RECURSION

I have already illustrated some of the situations in which a recursive subprogram or function could be useful. Now let's look at some actual programming examples. The Evaluate function in the listing below uses recursion to reinvoke itself for each new level of parentheses it encounters.

  DECLARE FUNCTION Evaluate# (Formula$)

  INPUT "Enter an expression: ", Expr$
  PRINT "That evaluates to"; Evaluate#(Expr$)

  FUNCTION Evaluate# (Formula$)

    'Search for an operator using INSTR as a table lookup.  If found,
    'remember which one and its position in the string.
    FOR Position% = 1 TO LEN(Formula$)
      Operation% = INSTR("+-*/", MID$(Formula$, Position%, 1))
      IF Operation% THEN EXIT FOR
    NEXT

    'Get the value of the left part, and a tentative value for the
    'right part.
    LeftVal# = VAL(Formula$)
    RightVal# = VAL(MID$(Formula$, Position% + 1))

    'See if there's another level to evaluate.
    Paren% = INSTR(Position%, Formula$, "(")

    'There is, call ourselves for a new RightVal#.
    IF Paren% THEN RightVal# = Evaluate#(MID$(Formula$, Paren% + 1))

    'No more to evaluate, do the appropriate operation and exit.
    SELECT CASE Operation%
      CASE 1    'addition
        Evaluate# = LeftVal# + RightVal#
      CASE 2    'subtraction
        Evaluate# = LeftVal# - RightVal#
      CASE 3    'multiplication
        Evaluate# = LeftVal# * RightVal#
      CASE 4    'division
        Evaluate# = LeftVal# / RightVal#
    END SELECT

  END FUNCTION

When you run this program, enter an expression like 15 * (12 + (100 / 8)). To keep the code to a minimum, Evaluate accepts only simple, two-number expressions.
That is, it will not work with more than one math operator within each pair of parentheses as in 10 * (3 + 4 + 5). However, the parentheses may be nested to nearly any level.

This function begins by examining each character in the incoming formula string for a math operator. If it finds one the operator number (1 through 4) is remembered, as well as its position in the formula string. Next, VAL is used to obtain the value of the digits to the left of the operator, as well as the digits to the right. Notice that it was not necessary to use LEFT$ to isolate the left-most portion of the string, because VAL stops examining the string when it encounters any non-digit character such as the "+" or "(".

Once these values have been saved, the next test determines if any more parentheses follow in the formula. If so, Evaluate calls itself, passing only those characters that are beyond the next parenthesis. Thus, the same routine evaluates each new level, returning to the level above only after all levels have been examined.

I encourage you to run this program in the QuickBASIC editing environment, and step through each statement one by one with the F8 Trace command. In particular, use the Watch Variable feature to view the value of Position% and LeftVal# as the function recurses into subsequent invocations.

It is important to understand the need for stack variables in this program, and why STATIC must not be used in the function definition. When Evaluate walks through the incoming string and determines which math operator is specified, that operator must be remembered throughout the course of the function. If a static variable were used for Operation%, then its previous value would be destroyed when Evaluate calls itself. Likewise, LeftVal# cannot be overwritten either, or it would not hold the correct value when Evaluate returns to itself from the level below.
Therefore, as you step through this program you will observe that each new invocation of Evaluate creates a new set of variables.

As you can see, stack variables are necessary for the proper functioning of a subprogram or function that calls itself. They are also necessary when one procedure calls another procedure which in turn calls the first one again. The key point is that each time a non-static routine is invoked, new and unique variables must be created. Otherwise, the variable contents from a previous level above will be overwritten.

Although recursion is a powerful and necessary technique, it should be used only when necessary. There is a substantial amount of overhead needed to allocate stack memory and clear it to zeros, so invoking a non-static routine is relatively slow. And as I described earlier, every non-static string variable must be deleted when the routine exits, at a cost of 9 bytes apiece.

Some programmers use recursion even when there are other, more efficient ways to solve a problem. For example, the QuickBASIC manual shows a recursive function that calculates a factorial. (A factorial is derived by multiplying a number by all of the whole numbers less than itself. That is, the factorial of 4 equals 4 * 3 * 2 * 1.) However, a factorial can be calculated faster and with less code using a simple FOR/NEXT loop as shown below. This version of Factorial is 20 percent faster than the example given in the QuickBASIC manual.

  FUNCTION Factorial#(Number%) STATIC
    Seed# = 1
    FOR X% = 1 TO Number%
      Seed# = Seed# * X%
    NEXT
    Factorial# = Seed#
  END FUNCTION

PASSING PARAMETERS TO PROCEDURES

As you have already learned, BASIC normally passes data to a subprogram or function by placing its address on the stack. And when an entire array is specified, the address of the array descriptor is sent instead. But there are some cases where BASIC imposes restrictions on how variables and arrays may be passed to a procedure.
Let's look now at some of the ways to get around those restrictions.

When using versions of BASIC earlier than PDS 7.1, it is not legal to pass an array of fixed-length strings. In fact, it is also impossible to pass a single fixed-length string directly. As you saw in Chapter 2, BASIC copies every fixed-length string argument to a regular string, which adds a lot of code and also wastes string memory. The simplest solution for fixed-length strings is to define an equivalent TYPE that is comprised of a single string component. Since a TYPE variable or array can legally be passed, this is the easiest and most direct approach, as shown here.

  TYPE FLen
    S AS STRING * 100
  END TYPE
  DIM MyString AS FLen
  CALL Subprogram(MyString)

  SUB Subprogram(FLString AS FLen)
    ...
    ...
  END SUB

If the subprogram being called is in a separate module, then the TYPE definition must also be present in that file. However, the DIM statement is needed only in the program that passes the string. This also works with fixed-length string arrays, except that the DIM would have to be changed to DIM MyArray(1 TO NumElements) AS FLen, and the subprogram's definition would be changed to SUB Subprogram(FLString() AS FLen).

BASIC PDS 7.1 supports passing a fixed-length string array directly, so this work-around is not needed with that version. Curiously, a single fixed-length string may not be passed as a parameter in BASIC 7.1. Since a fixed-length string is closely related to a TYPE variable, this limitation seems arbitrary at best.

BASIC 7.1 also supports the use of BYVAL when passing numeric arguments to procedures. This is a particularly powerful feature, because it can greatly reduce the amount of code needed to access those values within the routine. It also eliminates the need to make copies when a constant is passed as an argument. To take advantage of this feature, you simply specify BYVAL in both the calling and receiving argument list, as shown below.
  DECLARE SUB Subroutine(BYVAL Arg1%, BYVAL Arg2%)
  CALL Subroutine(Var1%, Var2%)

  SUB Subroutine(BYVAL X%, BYVAL Y%)
    ...
    ...
  END SUB

Because the actual value of the argument is being passed, there is no way to return information back to the caller. But in those situations where an assignment to the original variable from within the routine is not needed, BYVAL can eliminate a lot of compiler-generated code when dealing with integers. Of course, you may use a mix of BYVAL and non-BYVAL parameters if you need the benefits of both methods in a single call.

As proof of this savings, disassemblies of a one-statement subprogram designed both ways are presented below, to show how an integer parameter is accessed when it is passed by address and by value.

  SUB ByAddress(Param%) STATIC
    LocVar% = Param%
      MOV  SI,[Param%]    ;get the address of Param%
      MOV  AX,[SI]        ;then read the value there
      MOV  LocVar%,AX     ;assign that to LocVar%
  END SUB

  SUB ByValue(BYVAL Param%) STATIC
    LocVar% = Param%
      MOV  AX,Param%      ;read Param% directly
      MOV  LocVar%,AX     ;and assign it to LocVar%
  END SUB

Note that the savings are only within the subroutine, and not when it is called. That is, 4 bytes are needed to pass an integer variable whether by address or by value. In fact, passing larger data types requires more code to pass by value. Any variable can be passed by address with 4 bytes of compiler-generated code, because what is sent is a single address. But to pass a double precision number by value requires 16 bytes, since 4 bytes of code are needed for each 2-byte portion of the number.

In general, passing variables as parameters to a subprogram or function is preferable to sharing them. When many variables are shared throughout a program, you run the risk of introducing bugs caused by accidentally using the same variable name more than once. However, sharing has some definite advantages in at least two situations. The first is when a procedure must be accessed as quickly as possible.
Since a finite amount of code is needed to pass each parameter, some amount of time is also required to execute that code. Therefore, sharing a few, carefully selected variables can improve the speed of your programs and reduce their size as well.

Another important use for SHARED is to conserve data memory. Nearly all programs use at least a few temporary scratch variables, perhaps as FOR/NEXT loop counters. By dimensioning several such variables as being shared throughout a program, the same variables can be used repeatedly. I often begin programs with a DIM SHARED statement such as DIM SHARED X, Y, Z, and then use those variables as often as possible.

One final trick I want to share is how to pass a large number of parameters using less code than would normally be necessary. Each argument that is passed to a procedure requires 4 bytes of code. In a complicated routine that needs many parameters, this can quickly add up. Worse, these bytes are added for every call. Therefore, a subprogram that accepts 10 parameters and is called 20 times will add 800 bytes to the final executable file just to handle the parameters!

One solution is to use an array, which is ideal when all of the parameters are the same type of data. An entire array can be passed as a single parameter since only the array descriptor's address is needed. Even better, however, is to create a TYPE variable, and then assign all of the parameters to it. A TYPE variable can hold nearly any amount and type of data, and it too can be passed using only 4 bytes. Although this does require a separate assignment for each TYPE component, you simply use the TYPE where the regular variables would have been assigned. By eliminating the added code to pass many parameters, programs that use a TYPE this way will also be much faster.

MODULAR PROGRAMMING

QuickBASIC versions 4.0 and later let you load subprograms and functions from multiple files into the editing environment at the same time.
This further enhances their reusability, since the different modules can be treated as "black boxes" whose purpose is already known. Once a routine has been developed and debugged, it can be used again and again, without further regard for the names of the variables within the routines. Indeed, many of the utility routines included with this book are provided as separate modules, intended to be loaded along with your programs.

Any variable name can be passed as an argument to a procedure, even if a different name is used to represent the same variable within the procedure. If you have defined a subprogram such as SUB MySub(X%, Y!, Z$), then you could call it using CALL MySub(A%, B!, C$). Of course, the variables you pass must be of the same data type as the subroutine expects.

Because reusability is an important consideration in the design of any procedure, it generally makes sense to store it in its own source file. This lets you combine the same module repeatedly with any number of programs. The alternative would be to merge the file into each program that needs it. But maintaining multiple copies of the same code wastes disk space. Further, if a bug is found in the routine, you will have to identify all of the programs that contain it, and manually correct each one of them.

Another important advantage of using separate files is that you can exceed the usual 64K code size barrier. Unlike the data segment which is comprised of the sum of all data in all modules, an .EXE file can contain multiple code segments. Each BASIC module has a single code segment, and each of these can be as large as 64K. In fact, dividing a program into separate files is the *only* way to exceed the usual 64K code size limitation.

Although using a separate source file for each subprogram makes sense in many situations, there is one slight disadvantage. When all of the various program modules are linked together, each separate module adds approximately 100 bytes of overhead.
Nonetheless, for all but the smallest programming projects, the advantages of using separate modules will probably outweigh the slight increase in code size.

INCLUDE FILES

Another useful BASIC feature that can help you to create modular programs is the Include file. An Include file is a separate file that is read and processed by BASIC at a specified place in your program. The statement '$INCLUDE: 'filename' tells QB or BC to add the statements in the named file to your source code, as if that code had been entered manually. If a file extension is not given, then .BAS is assumed. Many of the files that Microsoft provides with QuickBASIC use a .BI extension, which stands for "BASIC Include". Some programmers use .INC, and you may use whatever seems appropriate to the contents of the file.

Include files are ideal for storing DECLARE, CONST, TYPE, and COMMON statements. Except for COMMON, none of these statements add to the size of your program, and none of them create any executable code. Therefore, you could create a single include file that is used for an entire project, and add an appropriate '$INCLUDE directive to the beginning of each program source file.

Unused DECLARE and CONST statements and TYPE definitions are ignored by BASIC if they are not referenced. However, they do impinge slightly on available memory within the QuickBASIC editor, since BASIC has no way to know that they are not being used. Similarly, BC must keep track of the information in these statements as it compiles your program. But again, there is no impact on the size of your final executable program.

In general, I recommend that you avoid placing any executable statements into an include file. Because the code in an include file is normally hidden from your view, it is easy to miss a key statement that is causing a bug. Likewise, a '$DYNAMIC or '$STATIC command hidden within an include file will obscure the true type of any arrays that are subsequently dimensioned.
Perhaps worst of all is placing a DEFINT or other DEFtype statement there, for the same reason.

QUICK LIBRARIES

Quick Libraries contribute to modular programming in two important ways. Perhaps the most important use for a Quick Library is to allow access to subprograms and functions that are not written in BASIC. All DOS programs and subroutines--regardless of the language they were originally written in--end up as .OBJ files suitable for LINK to join together. But the QB and QBX editing environments manipulate BASIC source code, and interpret the commands rather than truly compile them. Therefore, the only way you can access a routine written in assembly language or C within QuickBASIC is by placing the routine into a Quick Library.

Quick Libraries also let you store completed BASIC subprograms and functions out of the way from the rest of your program. If you have a large number of subroutines in one program, the list of names displayed when F2 is pressed can be very long and confusing. Since QuickBASIC does not display the routines in a Quick Library, there will be that many fewer names to deal with. Another advantage of placing pre-compiled BASIC routines into a Quick Library is that they can take less memory than when the BASIC source code is loaded as a module. This is true especially when you have many comments in the program, since comments are of course not compiled.

Be aware that there are a few disadvantages to placing BASIC code into a Quick Library. One is that you cannot step and trace through the code, since it is not in its original BASIC source form. Another is that Quick Libraries are always stored in normal DOS memory, as opposed to expanded memory which QBX [and VB/DOS] can use. When a BASIC subprogram or function is less than 16K in size and EMS is present, QBX [and VB/DOS] will place its source code in expanded memory to free up as much conventional memory as possible.
ERROR AND EVENT HANDLING
========================

As a BASIC programmer, you must deal with several types of errors in a program. These errors fall into two general categories: compile errors and runtime errors.

Compile errors are those that QB or BC issue, such as "Syntax error" or "Include file not found". Generally, these are easy to understand and correct, because the QuickBASIC editor places the cursor beneath the offending statement. In some cases, however, the error that is reported is incorrect. For example, if your program uses a function in a Quick Library that expects a string parameter and you forgot to declare it, BASIC reports a "Type mismatch" error. After all, with a statement such as X = FuncName%(Some$), how could BASIC know that FuncName% is not simply an integer array? Assuming that it is an array, BASIC rejects Some$ as being illegal for an element number.

Runtime errors are those such as "File not found" which are issued when your program tries to open a file that doesn't exist, or is not in the specified directory. Other common runtime errors are "Illegal function call", "Out of string space", and "Input past end". Many of these errors can be avoided by an explicit test. If you are concerned that string space might be limited you can query the FRE("") function before dimensioning a dynamic string array. However, some errors are more difficult to anticipate. For example, to determine if a particular directory exists you must use CALL Interrupt to query a DOS service.

The conventional way to handle errors is to use ON ERROR, and design an error handling subroutine. There are a number of problems with using ON ERROR, and most professional programmers try to avoid using it whenever possible. But ON ERROR does work, and it is often the simplest and most direct solution in many programs. The short listing below shows the minimum steps necessary to implement an error handler using ON ERROR.
  ON ERROR GOTO HandleErr
  FILES "*.XYZ"
  END

  HandleErr:
  SELECT CASE ERR
    CASE 53: PRINT "File not found"
    CASE 68: PRINT "Device unavailable"
    CASE 71: PRINT "Disk not ready"
    CASE 76: PRINT "Path not found"
    CASE ELSE: PRINT "Error number"; ERR
  END SELECT
  RESUME NEXT

The statement ON ERROR GOTO HandleErr tells BASIC that if an error occurs, the program should jump to the HandleErr label. Without ON ERROR, the program would display an error message and then end. Since it is unlikely that you have any files with an .XYZ extension, BASIC will go to the error handler when this program is run.

Within the error handling routine, the program uses the ERR function to determine the number of the error that occurred. Had line numbers been used in the program, the line number in which the error occurred would also be available with the ERL function. In this brief program fragment, the most likely error numbers are filtered through a SELECT CASE block, and any others will be reported by number.

Regardless of which error occurred, a RESUME NEXT statement is used to resume execution at the next program statement. RESUME can also be used with an explicit line label or number to resume there; if no argument is given BASIC resumes execution at the line that caused the error. In many cases a plain RESUME will cause the program to enter an endless loop, because the error will keep happening repeatedly. In this case, the file will not exist no matter how many times BASIC tries to find it. Therefore, a plain RESUME is not appropriate following a "File not found" or similar error. Had the error been "Disk not ready", you could prompt the user to check the drive and then press a key to try again. In that case, then, RESUME would make sense.

Although BASIC's ON ERROR can be useful, it does have a number of inherent limitations. Perhaps the worst problem with ON ERROR is that it often increases the program's size. When you use RESUME NEXT, you must also use the /x compile switch.
Unfortunately, /x adds internal address labels to show where each statement begins, so the RESUME statement can find the line that caused the error. These labels are included within the compiled code and therefore increase its size.

Another problem with ON ERROR is that it can hide what is really happening in a program. I recommend strongly that you REM out all ON ERROR statements while working in the QuickBASIC editing environment. Otherwise, an Illegal function call or other error may cause QuickBASIC to go to your error handler, and that handler might ignore it if the error is not one you were expecting and testing for. If that happens and your program uses RESUME NEXT, you might never even know that an error occurred!

Yet another problem with ON ERROR is that it's frankly a clumsy way to program. Most languages let you test for the success or failure of the most recent operation, and act on or ignore the results at your discretion. Pascal, for example, uses the IOResult function to indicate if an error occurred during the last input or output operation.

Finally, BASIC generates errors for many otherwise proper circumstances, such as the FILES statement above. You might think that if no files were found that matched the .XYZ extension given, then BASIC would simply not display anything. Indeed, an important part of toolbox products such as Crescent Software's QuickPak Professional are the routines that replace BASIC's file handling statements. By providing replacement routines that let you test for errors without an explicit ON ERROR statement, an add-on library can help to improve the organization of your programs.

As I mentioned earlier, some errors can be avoided by using CALL Interrupt to access DOS directly. (One important DOS service lets you see if a file exists before attempting to open it.) But critical errors such as those caused by an open drive door require assembly language.
In Chapter 12 you will learn how to bypass BASIC and access DOS directly using CALL Interrupt.

EVENT HANDLING

BASIC includes several forms of event handling, and like ON ERROR, these too are avoided when possible by many professional programmers. Event handling lets your programs perform a GOSUB automatically and without any action on your part, based on one or more conditions. Some of the more commonly used event statements are ON KEY, ON TIMER, and ON COM. With ON KEY, you can specify that a particular key or combination of keys will temporarily halt the program, and branch to a GOSUB routine designated as the ON KEY handler. ON TIMER is similar, except it performs a GOSUB at regular intervals based on BASIC's TIMER function. Likewise, ON COM performs a GOSUB whenever a character is received at the specified communications port.

The concept of event handling is very powerful indeed. For example, ON COM allows your program to go about its business, and also handle characters as they arrive at the communications port. ON TIMER lets you simulate a crude form of multi-tasking, where control is transferred to a separate subroutine at one second intervals.

Unfortunately, BASIC's event handling is not truly interrupt driven, and the resulting code to implement it adds considerably to a program's size. When any of the event handling methods are used, BASIC calls an interval event dispatcher periodically in your program. These calls add five bytes apiece, and one is added at either every statement, or at every labeled statement [depending on whether you compiled using /v or /w respectively]. This can increase your program's size considerably. Even worse, the repeated calls have an adverse effect on the speed of most programs.

Like ON ERROR, BASIC's event handling statements provide a simple solution that is effective in many programming situations. And also like ON ERROR, they are best avoided in important programming projects.
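For readers who have not used event trapping, the following is a minimal sketch of ON TIMER in action. The ShowTime label and the screen position are arbitrary choices made for illustration, and a separately compiled version would need the /v or /w switch mentioned above.

```basic
ON TIMER(1) GOSUB ShowTime    'branch to ShowTime once each second
TIMER ON                      'enable timer event trapping

CLS
LOCATE 3, 1
PRINT "Press any key to end"
DO
  'The main program goes about its business here.
LOOP UNTIL LEN(INKEY$)

TIMER OFF                     'disable trapping before ending
END

ShowTime:
  LOCATE 1, 70
  PRINT TIME$;                'update the clock display
  RETURN
```

Note that the main loop never calls ShowTime itself. BASIC performs the GOSUB on the program's behalf, which is exactly what the hidden calls to the event dispatcher make possible.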
Using purely BASIC techniques, the only alternative to event trapping is polling. Polling simply means that your program manually checks for events, instead of letting BASIC do it automatically. The primary advantage of polling is that you can control when and where this checking occurs. The disadvantage is that it requires more effort by you. To see if any characters have been received from a communications port but are still waiting to be read you would use the LOC function. And to see if a given amount of time has elapsed you must query the TIMER function periodically.

If true interrupt driven event handling were available in BASIC, that would clearly be preferable to either of the two available methods. However, only with Crescent's P.D.Q. product can such capability be added to a BASIC program.

PROGRAMMING STYLE

Programming style is a personal issue, and every programmer develops his or her own particular methods over time. Some aspects of programming style have little or no impact on the quality of the final result. For example, the number of columns you indent a FOR/NEXT loop will not affect how quickly a sort routine operates. But there are style factors that can help or harm your programs. One is that clearly commenting your code will help you to understand and improve it later. Another is when more than one programmer is working on a large project simultaneously. If neither programmer can figure out what the other is doing, the program's quality will no doubt suffer.

Clearly, no one can or even should try to force a particular style or ideology upon you. However, I would like to share some of the decisions that I have made over the years, and explain why they make sense to me. Of course, you are free to use or not use these opinions as you see fit. Programmers are as unique and varied as the members of any other discipline, and no one set of rules could possibly serve everyone equally. Whatever conventions you settle upon, be consistent above all else.
The most important convention that I follow is to use DEFINT A-Z as the first statement in every program. For me, using integers verges on religion, and my fingers could type DEFINT even if I were asleep. As I have stated repeatedly, integers should be used whenever possible, unless you have a compelling reason not to. Integers are much faster and smaller than any other variable type BASIC offers. Nearly all of the available third party add-on products use integer parameters wherever possible, and so should the routines you write. The only reasonable exception to this is when writing financial or scientific programs, or other math-intensive applications.

Equally important is adding sufficient and appropriate comments. Some programmers like to use comment headers that identify each related block of code; others prefer to comment every line. I recommend doing both, especially if other people will be reading your programs. I also prefer using an apostrophe as a comment delimiter, rather than the more formal REM. There are only so many columns available for each comment line, and it seems a shame to waste the space REM requires.

When writing a subprogram or function that you plan to use again in other projects, include a complete heading comment that shows the purpose of the routine and the parameters it expects. If each parameter is listed neatly at the beginning of the file, you can create a hardcopy index of routines by printing that section of each file.

Avoid comments that are obvious or redundant, such as this:

  Count = Count + 1   'increment Count

If Count is keeping track of the number of lines read from a file, a more appropriate comment would be 'show that another line was read. Also avoid comments that are too cute or flip. Simply state clearly what is happening so you will know what you had in mind when you come back to the program next month or next year.

Selecting meaningful variable names is equally valuable in the overall design of a program.
If you are keeping track of the current line in a file, use a variable name such as CurLine. Although BASIC in some cases lets you use a reserved word as a variable name, I recommend against that. Over the years, different versions of BASIC have allowed or disallowed different keywords for variables. While QuickBASIC 4.5 lets you use Name$ as a variable, there is no guarantee that the next version will. Also, be aware that variable names which begin with the letters Fn are illegal, because BASIC reserves that prefix for user-defined functions. Using the variable FName$ to hold a file name may look legal, but it isn't.

Don't be ashamed to use GOTO when it is appropriate. There are many places where GOTO is the most direct way to accomplish something. As I showed earlier in this chapter, GOTO when used correctly can sometimes produce smaller and faster code than any other method.

Use line labels instead of line numbers. The statement GOSUB 1020 doesn't provide any indication as to what happens at line 1020. GOSUB OpenFile, on the other hand, reads like plain English. The only exception to this is when you are debugging a program that crashes with the message "Illegal function call at line no line number". In that case, you should *add* line numbers to your program and run it again. A program that reads a source file and prints each line to another file with sequential numbers is trivial to write. I will also discuss debugging in depth in Chapter 4.

Even though using DEFINT is supposed to force all subsequent CONST, DEF FN, and FUNCTION declarations to be integer, a bug in QuickBASIC causes untyped names to occasionally assume the single precision default. Therefore, I always use an explicit percent sign (%) to establish each function's type. In fact, I use whatever type identifier is appropriate for functions and CONST statements, to make them easily distinguishable in the program listing.
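
As a minimal sketch of this convention, the fragment below gives a constant and a DEF FN function explicit percent signs. The names MaxRows%, FnClampRow%, and CurRow are chosen for illustration only, and the value 25 simply assumes a standard 25-line text screen.

```basic
DEFINT A-Z

CONST MaxRows% = 25             'the % makes the constant's type explicit

DEF FnClampRow% (Row%)          'the % fixes the function's return type
  IF Row% > MaxRows% THEN
    FnClampRow% = MaxRows%      'limit the row to the bottom of the screen
  ELSE
    FnClampRow% = Row%          'the row is already within range
  END IF
END DEF

CurRow = 30                     'plain variable, an integer through DEFINT
CurRow = FnClampRow%(CurRow)    'CurRow now holds 25
```

Because the constant and the function carry their type identifiers everywhere they appear, they stand apart from ordinary variables such as CurRow at a glance.
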
For example, in the statement IF CurRow > MaxRows% THEN CurRow = MaxRows%, I know that MaxRows% has been defined as a constant. Some people prefer to use all upper-case letters for constants, though I prefer to reserve upper case for BASIC keywords.

Although BASIC supports the optional AS INTEGER and AS SINGLE directives when defining a subprogram or function, that wastes a lot of screen space. I greatly prefer using the variable type identifiers. That is, I will use SUB MySub(A%, B!) rather than SUB MySub(A AS INTEGER, B AS SINGLE). The same information is conveyed but with a lot less effort and screen clutter.

A well-behaved subroutine will restore the PC to the state it was in when called. If you have a subprogram that prints a string centered on the bottom line of the screen, use CSRLIN and POS(0) to read the current cursor location before you change it. Then restore the cursor before you exit.

I like to indent two spaces within FOR/NEXT and IF/THEN blocks. Although some people prefer indenting four or even eight columns for each level, that can quickly get out of hand when the blocks are deeply nested. Nothing is harder to read than code that extends beyond the edge of the screen. But whatever you do, please *do not* change the tab stop settings in the QuickBASIC editor, unless you are the only one who will ever have to look at your code. Even though the program may look fine on your screen, the indentation will be completely wrong on everyone else's PC.

When creating a dynamic array I prefer REDIM to a previous '$DYNAMIC statement. REDIM is clearer because it shows at the point in the source where the array is dimensioned that this is a dynamic array. Otherwise you have to scan backwards through your source code looking for the most recent '$DYNAMIC or '$STATIC, to see what type of array it really is. By the same token, using ever-changing DEFtype statements throughout your code is poor practice.
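
Two of the suggestions above, restoring the cursor and dimensioning a dynamic array with REDIM at the point of use, might look something like the following. This is only a sketch: it assumes an 80 by 25 text screen, and Msg$, SaveRow, SaveCol, NumItems, and Item$ are hypothetical names introduced for illustration.

```basic
DEFINT A-Z

'----- print a message centered on the bottom line, then put the cursor back
Msg$ = "Press any key"
SaveRow = CSRLIN                      'remember where the cursor is now
SaveCol = POS(0)
LOCATE 25, (80 - LEN(Msg$)) \ 2 + 1   'assumes an 80 by 25 text screen
PRINT Msg$;                           'the semicolon prevents scrolling
LOCATE SaveRow, SaveCol               'restore the original cursor position

'----- REDIM shows right here that Item$() is a dynamic array
NumItems = 100
REDIM Item$(1 TO NumItems)
```

In a real program the first block would normally live inside the subprogram that prints the message, so the caller never notices that the cursor was moved.
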
Further, if a variable is a string, always include the dollar sign ($) suffix when you reference it. If you use DEFSTR S or even worse, DIM xxx AS STRING and then omit the dollar sign, nobody else will understand your program.

I also prefer to explicitly dimension all arrays, and not let BC create them with the 11-element default (including element zero). If you need fewer than 11 elements, the memory is wasted. And if you need more, then your program will behave unpredictably. Not dimensioning every array is sloppy programming. Period.

Avoid repeated calls to BASIC's internal functions if possible. In the listing below, the first example creates 61 bytes of code, while the second generates only 46 bytes.

Not recommended:

   IF CSRLIN = 1 OR CSRLIN = 6 OR CSRLIN = 12 THEN
     ...
   END IF

Much better:

   Temp = CSRLIN
   IF Temp = 1 OR Temp = 6 OR Temp = 12 THEN
     ...
   END IF

As I stated earlier in this chapter, using SELECT CASE instead of IF will also eliminate this problem. Many BASIC statements are translated into calls, and each call takes a minimum of five bytes.

Your programs will be easier to read if you evaluate temporary expressions separately. Even though BASIC lets you nest parentheses to nearly any level, nothing is gained by packing many expressions into a single statement. In the examples below that strip the extension from a file name, the first creates only a few bytes less code than the second. Although this may seem counter to the other advice I have given, a slight code increase is often more than offset by a commensurate improvement in clarity.

   File$ = LEFT$(File$, INSTR(File$, ".") - 1)

   Dot = INSTR(File$, ".")
   File$ = LEFT$(File$, Dot - 1)

The last issue I want to discuss is how to pronounce BASIC keywords and variable names. Don't laugh, but many programmers have no idea how to communicate the words LEFT$ or VARSEG over the telephone. Some people say "X dollar" for X$ even though "X string" is so much easier to say. Another keyword that's hard to verbalize is VARPTR.
I prefer "var pointer" since it is, after all, a pointer function. CHR$(13) is pronounced "character string thirteen", again because that's the clearest and most straightforward interpretation. Likewise, INSTR is pronounced "in string" and LEFT$ would be said as "left string". If you're not sure how to pronounce something, use the closest equivalent English wording you can think of.

SUMMARY

In this chapter you have learned how BASIC's control flow statements are constructed, and how the compiler-generated code is similar regardless of which statements are used. You also learned where GOSUB and GOTO should be used, and when subprograms and functions are more appropriate. The discussion on logical operations showed how AND, OR, EQV, and XOR operate, and how they can be used to advantage in your programs.

I have explained in detail exactly what recursion is, and how recursive subroutines can perform services that are not possible using any other technique. You have also learned about the importance of the stack in recursive and other non-static subroutines. Passing parameters to subprograms and functions has also been described in detail, along with some of the principles of modular programming and event handling.

Finally, I have shared with you some of my own personal preferences regarding programming style, and when and how such conventions can make a difference. Although this is a personal issue, I firmly believe it is important to develop a consistent style and stick with it.

In Chapter 4 you will learn debugging methods using both the QuickBASIC editing environment and Microsoft's CodeView debugger. The successful design of a program is but one part of its development. Once it has been written, it must also be made to work correctly and reliably. As you will learn, there are many techniques that can be used to identify and correct common programming errors.